Statistical Machine Translation of European Parliamentary Speeches
Abstract
In this paper we present the ongoing work at RWTH Aachen University for building a speech-to-speech translation system within the TCStar project. The corpus we work on consists of parliamentary speeches held in the European Plenary Sessions. To our knowledge, this is the first project that focuses on speech-to-speech translation applied to a real-life task. We describe the statistical approach used in the development of our system and analyze its performance under different conditions: dealing with syntactically correct input, dealing with the exact transcription of speech and dealing with the (noisy) output of an automatic speech recognition system. Experimental results show that our system is able to perform adequately in each of these conditions.
Paper type: (R) Research
Similar resources
Open Domain Speech Recognition & Translation: Lectures and Speeches
For years speech translation has focused on the recognition and translation of discourses in limited domains, such as hotel reservations or scheduling tasks. Only recently have research projects been started to tackle the problem of open domain speech recognition and translation of complex tasks such as lectures and speeches. In this paper we present the on-going work at our laboratory in open ...
The IRST English-Spanish Translation System for European Parliament Speeches
This paper presents the spoken language translation system developed at FBK-irst during the TC-STAR project. The system integrates automatic speech recognition with machine translation through the use of confusion networks, which make it possible to represent a huge number of transcription hypotheses generated by the speech recognizer. Confusion networks are efficiently decoded by a statistical machine t...
The IBM 2006 Speech Transcription System for European Parliamentary Speeches
TC-STAR is a European Union-funded speech-to-speech translation project to transcribe, translate and synthesize European Parliamentary Plenary Speeches (EPPS). This paper describes IBM’s English and Spanish speech recognition systems submitted to the TC-STAR 2006 Evaluation. The technical advances in this submission include two different algorithms for automatic segmentation and speaker cluste...
A Post-processing Approach to Statistical Word Alignment Reflecting Alignment Tendency between Part-of-speeches
Statistical word alignment often suffers from data sparseness. Parts of speech are often incorporated in NLP tasks to reduce data sparseness. In this paper, we attempt to mitigate this problem by reflecting the alignment tendency between parts of speech in statistical word alignment. Because our approach does not rely on any language-dependent knowledge, it is very simple and purely statistic to ...
Open Domain Speech Translation: From Seminars and Speeches to Lectures
This paper describes our ongoing work in open domain speech translation. We describe how we developed a lecture translation system by moving from speech translation of European Parliament Plenary Sessions and seminar talks to the open domain of lectures. We started with our speech recognition and statistical machine translation 2006 evaluation systems developed within the framework of TC-Star (...